Research for Uyghur-Chinese Neural Machine Translation
Abstract
Rare and unknown words are a major problem in Uyghur-Chinese machine translation, especially for neural machine translation (NMT) models. We propose a novel way to handle them. Building on NMT that uses pointers over the input sequence, our approach consists of a pre-processing step and a post-processing step and can be applied to any NMT model. Pre-processing modifies the Uyghur-Chinese corpus to extend the capability of the pointer network, and post-processing retranslates the raw translation using a phrase-based machine translation model or a word list. Experiments show that an NMT model using the proposed approach achieves a higher BLEU score than the phrase-based model on Uyghur-Chinese MT.
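The post-processing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `<unk>` placeholder, the function name `replace_unknowns`, the pointer representation (one source index per target token), and the toy tokens are all assumptions made for illustration. The sketch covers only the word-list variant. Each unknown target token is looked up through its pointer in a bilingual word list, and the source word is copied when no entry exists.

```python
from typing import Dict, List, Optional

UNK = "<unk>"  # hypothetical placeholder emitted by the NMT model


def replace_unknowns(
    source: List[str],
    raw_translation: List[str],
    pointers: List[Optional[int]],
    wordlist: Dict[str, str],
) -> List[str]:
    """Retranslate unknown tokens in a raw NMT output.

    pointers[i] is the source position that target token i points to,
    or None when the model produced no pointer for that token.
    """
    output = []
    for token, pointer in zip(raw_translation, pointers):
        if token != UNK or pointer is None:
            # Ordinary tokens (and unknowns without a pointer) pass through.
            output.append(token)
            continue
        source_word = source[pointer]
        # Use the word list when it has an entry; otherwise copy the source word.
        output.append(wordlist.get(source_word, source_word))
    return output


# Toy data, not taken from any real corpus.
source = ["men", "Ürümchige", "barimen"]
raw = ["我", "去", UNK]
pointers = [0, 2, 1]
wordlist = {"Ürümchige": "乌鲁木齐"}
result = replace_unknowns(source, raw, pointers, wordlist)
```

Swapping the word-list lookup for a call to a phrase-based system would give the other variant the abstract mentions. That call is left out here because its interface is not described in the source.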
Similar resources
Factor-Based Uyghur-Chinese Statistical Machine Translation
This paper is an initial exploration of Uyghur-Chinese statistical machine translation. Uyghur and Chinese are very different from each other: the former is an agglutinative language with very productive inflectional and derivational word-formation processes, whereas the characters of the latter are almost hieroglyphic, so morpheme processing does not work at all. We integrate Uyghur additional informatio...
A Phrase Table Filtering Model Based on Binary Classification for Uyghur-Chinese Machine Translation
In statistical machine translation, a large number of unreasonable phrase pairs in a phrase table can affect decoding efficiency and overall translation performance, especially in Uyghur-Chinese machine translation. In this paper, we present a novel phrase table filtering model based on binary classification, which considers differences between Uyghur and Chinese, and draws lessons from bin...
Uyghur-Chinese Translation Disambiguation Method Research Based on Knowledge Automatic-Acquisition
This thesis studies the disambiguation method in Uyghur-Chinese translation, and proposes the design philosophy of automatic-acquisition in translation label library aiming at the deficiency of disambiguation corpus in Uyghur. It refers to the existing Uyghur-Chinese bilingual dictionary, Chinese corpus and the Internet, and acquires the corresponding Chinese translation label examples to Uyghu...
Rule Based Analysis of the Uyghur Nouns
This paper describes the implementation of a rule-based analyzer for Uyghur (spoken in Sin Kiang, China) nouns. We hope this paper will contribute to advanced studies of the Uyghur language in Machine Translation and Natural Language Processing. Like all Turkic languages, Uyghur is an agglutinative language that has productive inflectional and derivational suffixes. In...
Log-linear Models for Uyghur Segmentation in Spoken Language Translation
To alleviate data sparsity in spoken Uyghur machine translation, we propose a log-linear morphological segmentation approach. Instead of learning a model only from a monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken translation based on both bilingual and monolingual corpora. Our approach relies on several features such as traditional conditional random fiel...